to make two evaluations: (1) we must
assess the performance-based impact of a
change, and (2) we must determine what
level of testing is required for the
assessed performance impact.


The addition of field changes, patches,
updates, and upgrades at regular,
short-term intervals makes the acquisition
of software-intensive systems complex at
best. Is the system configuration as
stable  today  in  test  as  it  was  six  months 
ago  at  the  last  software  review?  Is 
today's  test  item  representative  of  the 
item  that  will  be  delivered  to  the 
servicemember  in  the  field  a  year  from 
now? 

Traditional  software  metrics  fall  short  of 
providing  useful  answers  to  these 
questions.  SLOC,  version  numbers,  and 
release  dates  describe  the  physical 
characteristics  of  an  application.  The 
questions  that  most  often  arise  in  the 
management  of  a  software  program, 
however,  need  information  on  the 
operational  performance  of  the 
application.  To  answer  this  type  of 
question  in  a  meaningful  way,  we  need 


This paper has three main aims: (1) to
describe a framework and a set of terms
of reference for discussing the
operational impact of change in a
software application, (2) to provide an
example of applying this framework to
develop change metrics for a specific
problem, and (3) to illustrate the
evaluation of these metrics to make a
programmatic decision.


Background

The only constant is change. Nowhere is
this more evident than in the world of
software.

Problem Statement

The specific problem under discussion
requires an evaluation of the
performance-based impact of a code port
from Ada to C++ (an increasingly
common event) for the acoustic
processors of a torpedo control system.
It also serves to investigate the nature of
understanding and developing such
impact-based evaluations. The criteria
for use in this evaluation are Total
Architectural Change (TAC), a top-level
metric that describes the degree of
impact the code port will have on
torpedo performance, and Code Change
(CC), a working-level metric that
describes the extent of change within the
components. TAC and CC will be used
to determine the amount and type of
regression testing required as a result of
the code port.

Assumptions 

Only  a  code  port  is  being  tested.  There 
are  no  changes  in  the  host  platform, 
operating  system,  peripheral  interfaces, 
torpedo  body,  or  warhead  characteristics. 
If  this  assumption  is  not  true,  the  tests 
prescribed  by  this  report  will  not 
comprehensively  address  required 
regression  testing  of  the  torpedo. 

The Ada code is implemented using
object-oriented (OO) methodology (Ada
95 or later). If this assumption is not
true, the code port should be treated as if
the Total Architectural Change (TAC) is
100%.


Testing Requirements
Risk  Areas 
Architecture 

Various  design  decisions  made  in  the 
development  of  the  software  (component 
communications,  queuing,  object 
hierarchy,  data  structures,  etc.)  for  the 
Ada  design  may  or  may  not  be 
appropriate  for  use  with  a  C++ 
implementation.  These  attributes  of  the 
architecture must be analyzed with
respect to latency, data integrity,
modifiability,  availability,  reliability, 
and  stability  to  determine  to  what  degree 
the  design  must  be  changed  in  order  to 
meet  the  functional  requirements  of  the 
ADCAP.  The  degree  of  change  in  the 
design  will  have  a  direct  impact  on  the 
changes  required  in  the  code. 

TAC is calculated as the percentage of
Critical Software Components (CSCs)
that undergo significant change as a
result of the code revision. The two
main tasks in calculating TAC, then, are
to determine at what resolution to
distinguish CSCs and to identify what
constitutes a significant change.
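
As a minimal sketch of the calculation described above, assume each CSC has already been judged either significantly changed or not. The function name and the CSC names below are hypothetical and serve only to illustrate the percentage.

```python
from typing import Mapping


def total_architectural_change(csc_changed: Mapping[str, bool]) -> float:
    """Return TAC as the fraction of CSCs flagged as significantly changed."""
    if not csc_changed:
        raise ValueError("at least one CSC is required")
    return sum(csc_changed.values()) / len(csc_changed)


# Hypothetical CSCs for illustration; True marks a significant change.
cscs = {
    "signal_conditioning": True,
    "beamforming": False,
    "detection": False,
    "classification": False,
    "tracking": False,
}

tac = total_architectural_change(cscs)
print(f"TAC = {tac:.0%}")  # prints "TAC = 20%"
```

The result depends entirely on how the CSCs are divided and on what counts as significant, which is why those two decisions are treated separately below.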

Division of CSCs

The  TAC  is  to  have  a  macroscopic  effect 
on  the  testing  requirements,  determining 
whether  full-scale  testing  of  the  system 
will  be  required  or  the  system  can  meet 
regression  criteria  through  gradated 
testing  based  on  code  change.  To 
achieve  this  behavior,  the  TAC  should 
vary  linearly  with  respect  to  change  in 
software  function.  Programmatically, 
applications  are  often  divided  into 
functional  code  segments  that  are 
designed  by  a  single  team  from  a  core 
set  of  algorithms.  These  segments  often 
have  one-to-one  interface  relationships 
which  are  stable  parts  of  the  architecture. 
These properties normally lead to the
linear behavior desired for TAC, and
thus program functional code
segments are good candidates for CSCs.

To  formally  validate  linearity, 
investigate  the  next  higher  and  lower 
possible  sets  of  code  divisions.  Linear 
behavior  will  be  exhibited  when 
changing  the  implementation  of  a  code 
segment  at  the  next  lower  level 
necessitates changing the
implementation of peer code segments.
At  the  highest  level  at  which  linearity  is 
achieved,  a  change  in  one  code  segment 
does  not  necessitate  a  change  in  peer 
code  segments,  making  TAC  evaluated 
at  the  next  higher  level  of  division 
different  from  the  candidate  level. 

In  application,  most  types  of  code  never 
exhibit  strict  linearity  in  this  way,  so  an 
engineering  judgement  call  is  required  to 
determine  the  highest  level  at  which 
TAC  remains  stable  when  different 
segments  are  changed. 

Significant  Change 

There  are  various  differences  between 
the  Ada  and  C++  languages  that  will 
necessitate  functional  differences  in  the 
implementation of the system. The DoD
Requirements for High Order Computer
Programming Languages (DOD 1976),
commonly known as Steelman,
should be used to gauge functional
changes in the code. If the Ada code
exploits a Steelman requirement not
provided in C++, the code should be
considered significantly changed. Note
that  the  Steelman  is  not  being  used  to 
determine  the  suitability  of  either 
language  for  the  intended  purpose,  but 
rather  to  provide  a  common  basis  of 
comparison  for  the  functional  aspects  of 
different  implementations  of  the  same 
design  in  the  two  languages. 

Code  Change 

Code  Change  should  be  accounted  at  the 
lowest  level  possible  for  the  application, 
in  this  case  along  the  domain  of  object 
methods  and  properties.  A  method  or 
property  that  requires  one  or  more 
significant  changes,  as  described  above, 
is considered changed. The measure of
object change, $\delta_o$, is the number of
changed methods and properties divided
by the total number of methods and
properties.

\[ \delta_o = \frac{\text{changed methods} + \text{changed properties}}{\text{total methods} + \text{total properties}} \]

The first-degree extent of change,
$\mathrm{ext}_1$, is the number of object
interfaces that the change involves
divided by the total number of object
interfaces. An object interface is defined
as the relationship between two objects
where one object activates the methods
or queries the properties of another.

\[ \mathrm{ext}_1 = \frac{\text{number of changed object interfaces}}{\text{total number of object interfaces}} \]

The Code Change within the software
system, $\Delta$, used to evaluate risk for
this code port is the product of the total
object change and the first-degree extent
of change:

\[ \Delta = \delta_o \times \mathrm{ext}_1 \]

The  total  code  change  will  determine 
gradation  of  testing  requirements  as 
appropriate  for  the  determined  level  of 
change. 
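
The three quantities above can be sketched in a few lines of code. This is an illustration under stated assumptions rather than a prescribed tool: the `ChangeCounts` record, the function names, and the tallies are all hypothetical, and the counts are assumed to have been gathered by hand or by a separate analysis.

```python
from dataclasses import dataclass


@dataclass
class ChangeCounts:
    """Hypothetical tallies for one code port (illustrative names only)."""
    changed_methods: int
    changed_properties: int
    total_methods: int
    total_properties: int
    changed_interfaces: int
    total_interfaces: int


def object_change(c: ChangeCounts) -> float:
    """Fraction of methods and properties that are significantly changed."""
    total = c.total_methods + c.total_properties
    if total == 0:
        raise ValueError("no methods or properties to count")
    return (c.changed_methods + c.changed_properties) / total


def first_degree_extent(c: ChangeCounts) -> float:
    """Fraction of object interfaces that the change involves."""
    if c.total_interfaces == 0:
        raise ValueError("no object interfaces to count")
    return c.changed_interfaces / c.total_interfaces


def code_change(c: ChangeCounts) -> float:
    """Code Change: product of object change and first-degree extent."""
    return object_change(c) * first_degree_extent(c)


# Made-up tallies: a small change touching many interfaces versus a
# large change touching few interfaces.
narrow = ChangeCounts(5, 5, 60, 40, 16, 20)   # object change 0.1, extent 0.8
broad = ChangeCounts(50, 30, 60, 40, 2, 20)   # object change 0.8, extent 0.1

print(round(code_change(narrow), 2), round(code_change(broad), 2))
```

Because the metric is a product, the two hypothetical cases score the same: a small change behind heavily used interfaces counts as much as a large change to relatively isolated code.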

Types  of  Test 

The  purpose  of  determining  the  TAC 
and  CC  for  an  application  undergoing 
revision  is  to  direct  the  test  efforts  for 
that  system  to  achieve  maximum 
confidence  in  system  performance  while 
incurring  minimum  cost.  Listed  below 
is  a  representative  subset  of  different 
testing  techniques  (DMSO  2001)  that 
can be used to verify the regression
performance of the torpedo controller
after the revision to the acoustic
functional segment. Each type of test
requires  different  resources  and  looks  at 
performance  in  different  ways. 

Functional  Component  Test 

A  full  Functional  Component  Test 
requires  testing  the  performance  of  each 
CSC  against  its  requirements.  A  partial 
Functional  Component  Test  requires 
testing  the  performance  of  CSCs  in 
which  an  object  has  undergone 
significant  change. 

Integration  Test 

An  Integration  Test  tests  the  interaction 
among  CSCs,  ensuring  functional 
correctness  and  the  ability  to  meet 
quality  requirements  (latency,  stability, 
reliability, etc.) of the design. Thread
testing is an example of an integration
test. Functional component tests along
with  integration  tests  provide  the  basis 
for  acceptance  of  the  modified  system 
for  operational  test. 

Turing  Test 

In  a  Turing  test,  two  systems  are  run 
under  identical  conditions.  Subject 
matter  experts  compare  the 
performances  of  the  systems  in  an 
attempt  to  identify  any  distinctions 
between  the  systems.  For  the  purposes 
of  a  Turing  test  on  the  torpedo  code 
port,  the  Naval  Undersea  Warfare 
Center,  Newport  RI  Weapons 
Assessment  Facility  modeling  and 
simulation  application  could  be  an 
acceptable  test  environment.  The  test 
plan  for  such  a  test  would  specify  the 
required  number  of  tests,  initial 
conditions  of  the  test  scenarios, 
performance  metrics  to  be  compared, 
and the subjective and objective criteria
for comparison. While the Turing test is
required  to  produce  a  high  correlation 
between  the  systems’  performances,  the 
number  of  tests  conducted  determines 
the  confidence  of  the  result. 
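
The objective part of such a comparison could be sketched as follows. This is only an illustration: the choice of Pearson correlation, the per-scenario metric, the numbers, and the 0.95 acceptance level are all assumptions, not values taken from any test plan.

```python
import math
from typing import Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired results from two systems."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized series of at least 2 runs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Made-up values of one performance metric per scenario, with both
# systems run under identical initial conditions.
ada_runs = [12.1, 8.4, 15.0, 9.7, 11.3]
cpp_runs = [12.0, 8.6, 14.8, 9.9, 11.1]

REQUIRED_CORRELATION = 0.95  # assumed acceptance level, set by the test plan

r = pearson_correlation(ada_runs, cpp_runs)
print("consistent" if r >= REQUIRED_CORRELATION else "distinguishable")
```

More scenario runs do not change what the correlation measures, but they narrow the uncertainty around it, which is the sense in which the number of tests determines confidence.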


The  Turing  test  can  only  be  used  to 
ensure  consistency  of  performance  and 
not  to  identify  decline  or  improvement  in 
performance.  If  the  code  port  results  in 
a  significant  improvement  in 
performance,  the  system  will  fail  the 
Turing  test.  Failure  of  the  Turing  test 
with  improved  performance  in  the 
simulated environment does not equate
to confidence in improved performance
in the real-world environment. Any
failure of the Turing test would require
in-water testing to resolve the regression
requirement for the torpedo.
TAC        Code Change    Functional       Integration  Turing Test      In Water
           (Δ)            Component Test   Test                          Test
---------  -------------  ---------------  -----------  ---------------  ---------
TAC > .2                  Full             Mandatory                     Mandatory
TAC ≤ .2   .8 < Δ         Full             Mandatory    N/A              Mandatory
           .6 < Δ < .8    Full             Mandatory    High Confidence  Dependent
           .4 < Δ < .6    Full             Mandatory    Low Confidence   Dependent

In  Water  Test 

The In Water Test is standard
COMOPTEVFOR at-sea testing. If the
changes in the system due to the code
port are sufficiently large, or if the
results of less robust testing are
confounded, as described above,
in-water testing will be required to
resolve the regression performance issue.

Determining  the  Level  of  Test 

By  accepting  a  test  method  that  takes  a 
more  focused  look  at  the  system  and 
uses  fewer  resources  (though  not 
necessarily  at  lower  organizational  cost), 
the  test  authority  accepts  greater  risk  of 
ambiguous  results  that  would  require 
further  testing. 

TAC and CC can be used to provide a
high-confidence estimate of the
minimum testing requirements for a
given code revision. Total Architectural
Change is used as a first-level index of
the operational impact of the code
revision. A nominal threshold value must
be determined, above which a full,
in-water test is required. This threshold
roughly equates to the percentage of
variability that the end user of the
application is willing to accept between
the regression performance of the old
system and that of the new system. Thus,
if the torpedo in question had a 50%
probability of hit and the customer were
willing to accept ± 5% variability, the
TAC full test threshold would be .2 (that
is, the range of variability, 10%, is .2 of
the 50% performance characteristic).
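
The arithmetic in this example can be written out explicitly. The symbols are assumptions introduced here for illustration: $P$ is the baseline probability of hit, $v$ is the accepted variability on either side of it (read as percentage points), and $T_{\mathrm{TAC}}$ is the resulting full test threshold.

```latex
\[
T_{\mathrm{TAC}} = \frac{2v}{P} = \frac{2 \times 0.05}{0.50} = \frac{0.10}{0.50} = 0.2
\]
```

On this reading, a tighter tolerance or a higher baseline performance both lower the threshold, so that full in-water testing is triggered by a smaller architectural change.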

If  the  application  is  determined  to  be  a 
candidate  for  graduated  testing,  the 
degree  of  testing  should  be  determined 
by  the  degree  of  Code  Change.  A 
greater  CC  value  indicates  a  greater 
degree  of  change  with  respect  to  the 
operational  characteristics  of  the 
application.  Thus,  the  degree  and  type 
of  testing  required  to  validate  the  new 
software  against  regression  performance 
increases  with  increasing  CC.  A 
representative  table  showing  the  degree 
of  testing  required  is  given  above. 
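
The selection logic can be sketched as a simple lookup, using only the rows that survive in the table. The function and type names are hypothetical, the handling of values that fall exactly on a boundary is an assumption, and Code Change values of .4 or below are reported as not covered because no such rows are available here.

```python
from typing import NamedTuple, Optional


class RegressionPlan(NamedTuple):
    """Test levels in the order the test types are described above."""
    functional_component: str
    integration: str
    turing: str
    in_water: str


def required_testing(tac: float, cc: float,
                     tac_threshold: float = 0.2) -> Optional[RegressionPlan]:
    """Map TAC and Code Change to test levels from the recoverable rows."""
    if tac > tac_threshold:
        # Above the threshold a full in-water test is required; the
        # Turing test entry for this row is not stated in the table.
        return RegressionPlan("Full", "Mandatory", "not stated", "Mandatory")
    if cc > 0.8:
        return RegressionPlan("Full", "Mandatory", "N/A", "Mandatory")
    if cc > 0.6:  # assumption: a value of exactly .8 falls in this band
        return RegressionPlan("Full", "Mandatory", "High Confidence", "Dependent")
    if cc > 0.4:  # assumption: a value of exactly .6 falls in this band
        return RegressionPlan("Full", "Mandatory", "Low Confidence", "Dependent")
    return None  # no row available for Code Change of .4 or below


plan = required_testing(tac=0.1, cc=0.7)
print(plan)
```

Here "Dependent" is read as meaning that the need for in-water testing depends on the outcome of the Turing test, consistent with the requirement that any Turing test failure be resolved in the water.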

Summary 

While  change  to  software  systems,  on 
the  whole,  is  difficult  to  describe,  it  is 
possible  to  quantify  the  impact  of  change 
to  software  on  different  programmatic 
activities.  With  respect  to  regression 
performance  testing,  the  change  to  a 
software  system  due  to  code  revision 
should  be  evaluated  at  the  macroscopic 
and  working  levels. 

The macroscopic evaluation of software
change is based upon significant change
to high-level Critical Software
Components. At this level the impact of
change is proportional to the changes in
the system design.

The working-level evaluation of change
is  based  upon  both  the  volume  of  change 
effected  in  the  code  and  the  degree  of 
interdependence  within  the  code. 
Change  to  a  few  lines  of  code  that  are 
heavily  accessed  by  the  rest  of  the 
application  is  weighted  as  much  as 
extensive  change  to  relatively 
independent  pieces  of  code. 

Both the macroscopic and the
working-level degrees of impact need to be
considered  when  determining  the  extent 
of  testing  required  to  validate  regression 
performance  of  the  application. 